Textual Data Mining through the Synergistic Combination of Classifiers and Linguistic Processors
Abstract
Numerical data mining tools are generally quite robust but only provide coarse-granularity results; such tools can handle very large inputs. Computational linguistic tools are able to provide fine-granularity results but are less robust; such tools, often semi-automatic, usually handle relatively short inputs. A synergistic combination of both types of tools is the basis of our hybrid approach. First, a connectionist classifier is used to locate potentially interesting documents, or segments thereof. Second, the user selects segments that will be forwarded to the linguistic processor in order to semi-automatically analyse their textual data and extract relevant information or knowledge elements. We present the main characteristics of our hybrid approach to textual data mining, plus a methodology by which it can be put to use. We also report on the results of a first evaluation involving a corpus made up of two texts pertaining to two
Similar resources
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is a basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Textual Enhancement across Linguistic Structures: EFL Learners' Acquisition of English Forms
Research on the benefits of textual input enhancement in the acquisition of linguistic forms has produced mixed results in the SLA literature. The present study investigates the effects of textual enhancement on adult foreign language intake of two English linguistic forms (subjunctive mood and inversion structures) to explore the role of the type of linguistic items in input enhancement studies. It also invest...
Investigating Discourse Socialisation Progress of an English as a Second Language Learner Using Systematic Functional Linguistic Approach
This study was framed on the theory of Language Socialisation and a Systematic Functional Linguistic (SFL) approach. The aim of the study was to analyse the oral presentation discourse produced by an elementary Iranian English as a Second Language (ESL) postgraduate student in an American university four times (September/December, 2015 and March/September, 2016) over one year. The data were col...
Emotion Modeling from Writer/Reader Perspectives Using a Microblog Dataset
Most recent studies on emotion analysis and detection focus on how writers express their emotions through textual information. In this paper, we model emotion generation on the Plurk microblogging platform from both writer and reader perspectives. Support Vector Machine (SVM)-based classifiers are used for emotion prediction. To better model emotion generation on such a social network, three ty...
Presenting a Model for Extracting Information from Textual Documents, Based on Text Mining in the Field of E-Learning
As computer networks become the backbones of science and economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. T...